F0 transformation within the voice conversion framework
نویسندگان
چکیده
In this paper, several experiments on F0 transformation within the voice conversion framework are presented. The conversion system is based on a probabilistic transformation of line spectral frequencies and residual prediction. Three probabilistic methods of instantaneous F0 transformation are described and compared. Moreover, a new modification of inter-speaker residual prediction is proposed which utilizes the information on target F0 directly during the determination of suitable residuum. Preference listening tests confirmed that this modification outperformed the standard version of residual prediction.
منابع مشابه
Voice Conversion Based on Probabilistic Parameter Transformation and Extended Inter-speaker Residual Prediction
Voice conversion is a process which modifies speech produced by one speaker so that it sounds as if it is uttered by another speaker. In this paper a new voice conversion system is presented. The system requires parallel training data. By using linear prediction analysis, speech is described with line spectral frequencies and the corresponding residua. LSFs are converted together with instantan...
متن کاملText-independent F0 transformation with non-parallel data for voice conversion
In voice conversion, a simple frame-level mean and variance normalization is typically used for fundamental frequency (F0) transformation, which is text-independent and requires no parallel training data. Some advanced methods transform pitch contours instead, but require either parallel training data or syllabic annotations. We propose a method which retains the simplicity and text-independenc...
متن کاملData-driven emotion conversion in spoken English
This paper describes an emotion conversion system that combines independent parameter transformation techniques to endow a neutral utterance with a desired target emotion. A set of prosody conversion methods have been developed which utilise a small amount of expressive training data ( 15 min) and which have been evaluated for three target emotions: anger, surprise and sadness. The system perfo...
متن کاملHierarchical modeling of F0 contours for voice conversion
Voice conversion systems deal with the conversion of a speech signal to sound as if it was uttered by another speaker. The conversion of the spectral features has attracted a lot of research attention but the conversion of pitch, modeling the speakerdependent prosody, is often achieved by just controlling the F0 level and range. However, the detailed prosody, including different linguistic unit...
متن کاملEmotional Voice Conversion Using Neural Networks with Different Temporal Scales of F0 based on Wavelet Transform
An artificial neural network is one of the most important models for training features of voice conversion (VC) tasks. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as mel cepstral coefficients (MCC) which represent the spectrum features. However, a simple representation for fundamental frequency (F0) is not enough for neural networks to deal with an...
متن کامل